Modeling text with generalizable Gaussian mixtures
Abstract
We apply and discuss generalizable Gaussian mixture (GGM) models for text mining. The model automatically adapts its complexity to a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and on the sample size. We discuss the relation between supervised and unsupervised learning in text data. Finally, we implement a novelty detector based on the density model.
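The ideas in the abstract (a mixture density whose complexity is chosen from the data, and a novelty detector built on that density) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's GGM algorithm: it uses plain EM with diagonal covariances, picks the number of components by held-out log-likelihood, and flags a point as novel when its log-density falls below a low quantile of the training log-densities. The function names, the 5% quantile, and the synthetic 2-D points (stand-ins for a low-dimensional text representation) are all hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_terms(x, model):
    """Per-component values of log(weight) + log N(x | mean, var)."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in model]

def log_density(x, model):
    """Log mixture density log p(x), computed with log-sum-exp."""
    terms = component_terms(x, model)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def fit_gmm(data, k, iters=50, seed=0, var_floor=1e-3):
    """Plain EM for a k-component diagonal GMM (assumption: not the GGM variant)."""
    rng = random.Random(seed)
    dim, n = len(data[0]), len(data)
    # Initialise means at random training points, unit variances, equal weights.
    model = [(1.0 / k, list(m), [1.0] * dim) for m in rng.sample(data, k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            terms = component_terms(x, model)
            top = max(terms)
            probs = [math.exp(t - top) for t in terms]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and floored variances.
        new_model = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, data)) / nj)
                   for d in range(dim)]
            new_model.append((nj / n, mean, var))
        model = new_model
    return model

def select_gmm(train, valid, max_k=4):
    """Choose k by mean held-out log-likelihood; returns (score, k, model)."""
    best = None
    for k in range(1, max_k + 1):
        model = fit_gmm(train, k)
        score = sum(log_density(x, model) for x in valid) / len(valid)
        if best is None or score > best[0]:
            best = (score, k, model)
    return best

def novelty_threshold(train, model, quantile=0.05):
    """Log-density below which a point is treated as novel."""
    scores = sorted(log_density(x, model) for x in train)
    return scores[int(quantile * len(scores))]

def is_novel(x, model, threshold):
    """True when x is less probable than the chosen training quantile."""
    return log_density(x, model) < threshold

# Synthetic stand-in data: two well-separated clusters in 2-D.
rng = random.Random(1)
points = [[rng.gauss(c, 0.5), rng.gauss(c, 0.5)]
          for c in (0.0, 5.0) for _ in range(90)]
rng.shuffle(points)
train, valid = points[:120], points[120:]

score, k, model = select_gmm(train, valid)
threshold = novelty_threshold(train, model)
print("selected components:", k)
print("far point novel:", is_novel([20.0, -20.0], model, threshold))
print("cluster centre novel:", is_novel([0.0, 0.0], model, threshold))
```

Scoring candidate models on held-out data, rather than on the training set, is what keeps the selected complexity tied to generalization: adding components always raises the training likelihood but not necessarily the held-out one.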
Similar resources
Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition
The Gaussian mixture model-universal background model (GMM-UBM) has been dominant in text-independent speaker recognition tasks. However, the conventional GMM-UBM method assumes that each Gaussian mixture is independent and ignores the fact that, within Gaussian mixtures, there exist useful high-level speaker-dependent characteristics, such as word usage or speaking habits. Based on the G...
Modeling Audio and Visual Cues
Audio-visual event detection aims to identify semantically defined events that reveal human activities. Most previous literature focused on restricted highlight events, and depended on highly ad-hoc detectors for these events. This research emphasizes generalizable robust modeling of single-microphone audio cues and/or single-camera visual cues for the detection of real-world events, requiring ...
Erratum to: Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions
Cluster-weighted modeling (CWM) is a mixture approach to modeling the joint probability of data coming from a heterogeneous population. Under Gaussian assumptions, we investigate statistical properties of CWM from both theoretical and numerical points of view; in particular, we show that Gaussian CWM includes mixtures of distributions and mixtures of regressions as special cases. Further, we in...
Comparison of Clustering Algorithms for Speaker Identification
In this paper we consider the problem of text-independent speaker identification, which belongs to acoustic recognition research. Many different techniques have been presented over the past several decades. A state-of-the-art technique uses Gaussian Mixtures (GMM) for modeling the speaker data distribution represented by MFCC [1] or LPCC [2] features. The classification is obtained by choosing the speaker cl...
Infinite Dirichlet Mixtures in Text Modeling
This paper proposes a Dirichlet process mixture modeling approach to Dirichlet Mixtures (DM). Endowing a prior distribution on an infinite number of mixture components, this approach yields an appropriate number of components as well as their parameters at the same time. Experimental results on amino acid distributions and text corpora confirmed this effect and showed comparative performance on...
Publication date: 2000